Abstract — For a certain class of functions, the distribution of the function values can be calculated in the trellis or a sub-trellis. The forward/backward recursion known from the BCJR algorithm [1] is generalized to compute the moments of these distributions. In analogy to the symbol probabilities, introducing a constraint at a certain depth in the trellis yields symbol moments. These moments are required for an efficient implementation of the discriminated belief propagation algorithm in [2]. They can also be used to compute conditional entropies in the trellis.

The moment computation algorithm has the same asymptotic 
complexity as the BCJR algorithm. It is applicable to any 
commutative semi-ring, thus actually providing a generalization 
of the Viterbi algorithm [3]. 

Index Terms — Trellis Algorithms, Viterbi Algorithm, BCJR 
Algorithm, Distributions, Moments, Decoding, Complexity 

I. Introduction 

Trellises were introduced into the coding theory literature 
by Forney [4] as a means of describing the Viterbi algorithm 
for decoding convolutional codes. Bahl et al. [1] showed that 
block codes can also be described by a trellis, and Wolf [5] 
proposed the use of the Viterbi algorithm for trellis-based 
soft-decision decoding of block codes. Massey [6] gave a 
graph-theoretic definition of a block trellis and an alternative 
construction of minimal trellises. Forney's paper [7] showed 
that group codes, including linear codes and lattices, have a 
well-defined trellis structure. 

In [8], McEliece investigated the complexity of a generalized Viterbi algorithm that allows efficient computation of flows on a code trellis. These results were further generalized in [9] and [10]. However, the calculation of flows does not fully exploit the capabilities of the trellis representation. For a certain set of functions it is also possible to calculate the moments of these functions in the trellis. These functions can be scalar or vector-valued, as long as they are linear and fulfill a separability criterion.

For iterative decoding of coupled codes, the popular sum-product algorithm is used to calculate the symbol probabilities of the component codes. These probabilities are exchanged between component decoders until a stable solution is found. This iterative algorithm works very well for long "turbo" codes, low-density parity check (LDPC) codes and some other codes, which are obtained by concatenating simple component codes in a special way. However, performance becomes poor when short codes or some good component codes are used.

Recently, Sorger [2] showed that iterative decoding is improved when code words $c$ are discriminated by their correlation $c r^T$ with the received word $r$, or by their correlation $c w^T$ with a 'believed' word $w$. Not only symbol probabilities are considered, but also the distribution of these probabilities over the correlation value. An efficient algorithm is introduced that uses the first two moments to approximate these distributions.
Fig. 1. Symbol Distributions of Correlation $c r^T$



In this paper we propose algorithms to compute both such 
distributions and their moments in the trellis. 

Example 1: Consider Figure 1, which shows two distributions of the correlation function $c r^T$. Here $c$ is a code word and $r$ is the noisy version of a code word $c \in C$ after transmission over a memoryless binary symmetric channel (BSC). The curves show the distributions for $c \in C_i(+1)$ and $c \in C_i(-1)$, respectively. The set $C_i(x) := \{c \in C : c_i = x\}$ denotes the sub-code of $C$ for which the symbol $c_i$ at a given position $i$ of each code word equals $x \in \{-1, +1\}$. The integrals over the distributions equal the symbol probabilities $P(c_i = x \mid r)$. However, the probability ratio
$$\frac{P(c_i = +1 \mid r, c r^T)}{P(c_i = -1 \mid r, c r^T)} \tag{1}$$
varies significantly over $c r^T$. This can be exploited when knowledge of the correlation $c r^T$ with the transmitted code word is available.

The distributions in Figure 1 can be approximated by their moments
$$E_c\left[(c r^T)^m \mid r, c_i = x\right] = \sum_{c \in C} (c r^T)^m \, P(c \mid r, c_i = x) \tag{2}$$
up to a certain order $m$, where $E_c[\cdot]$ is the expectation over all code words $c \in C$. For sufficiently long codes the distributions become Gaussian, as follows from the central limit theorem. Hence we can expect the first two moments to suffice for a good approximation.

We present generalizations of the methods in [8] that enable us to compute distributions $P(c w^T, c_i = \pm 1 \mid r)$ and expressions like $E_c\left[(c w^T)^m \mid r, c_i = x\right]$ for some word $w$, both for hard and soft decision. Equation (2) is a special case of the latter. The complexity of the algorithm is of the same order as that of the classically used BCJR algorithm.

The remainder of this paper is structured as follows. The next section reviews common terminology in the context of trellises. Section III extends this terminology and treats the computation of distributions and their moments in a more general framework. In Section IV we return to the original problem by transferring the results of Section III to linear block codes, and we calculate the conditional entropy in the trellis.

II. Definitions 

We deliberately follow, to a wide extent, the notation and style of McEliece. The first paragraph is an excerpt from [8] with minor modifications.¹

A trellis $T = (V, E)$ of rank $n$ is a finite directed graph with vertex set $V$ and edge set $E$, in which every vertex is assigned a depth in the range $\{0, 1, \ldots, n\}$. Each edge connects a vertex at depth $i - 1$ to one at depth $i$, for some $i \in \{1, 2, \ldots, n\}$. Multiple edges between vertices are allowed. The set of vertices at depth $i$ is denoted by $V_i$, so that $V = \bigcup_{i=0}^{n} V_i$. For $v \in V_i$ we write $\mathrm{depth}(v) = i$. The set of edges connecting vertices at depth $i - 1$ to those at depth $i$ is denoted $E_{i-1,i}$, so that $E = \bigcup_{i=1}^{n} E_{i-1,i}$. There is only one vertex at depth 0, called $A$, and only one at depth $n$, called $B$. If $e \in E$ is a directed edge connecting the vertices $u$ and $v$, which we denote by $e : u \to v$, we call $u$ the initial vertex and $v$ the final vertex of $e$, and we write $\mathrm{init}(e) = u$, $\mathrm{fin}(e) = v$. We denote the number of edges leaving a vertex $v$ by $\rho^+(v)$, and the number of edges entering a vertex $v$ by $\rho^-(v)$, i.e.,
$$\rho^+(v) = |\{e : \mathrm{init}(e) = v\}|, \qquad \rho^-(v) = |\{e : \mathrm{fin}(e) = v\}|.$$

If $u$ and $v$ are vertices, a path $P$ of length $L$ from $u$ to $v$ is a sequence of $L$ edges $P = e_1 e_2 \cdots e_L$ such that $\mathrm{init}(e_1) = u$, $\mathrm{fin}(e_L) = v$, and $\mathrm{fin}(e_i) = \mathrm{init}(e_{i+1})$ for $i = 1, 2, \ldots, L - 1$. If $P$ is such a path, we sometimes write $P : u \to v$ for short, as well as $\mathrm{init}(P) = \mathrm{init}(e_1)$ and $\mathrm{fin}(P) = \mathrm{fin}(e_L)$. We denote the set of paths from vertices at depth $i$ to vertices at depth $j$ by $E_{i,j}$. We assume that for every vertex $v \neq A, B$ there is at least one path from $A$ to $v$ and at least one path from $v$ to $B$.

Example 2 (Trellis): Figure 2 shows a trellis of rank $n = 4$ with edge set $E = \{a, b, c, d, e, f, g, h, i, j, k, l\}$ and vertex set $V = \{A, 1, 2, 3, 4, 5, 6, B\}$. There are eight paths $P : A \to B$ from $A$ to $B$. Vertex $v = 1$ has $\rho^-(1) = 1$ entering edge (edge $a$) and $\rho^+(1) = 2$ leaving edges (edges $c$ and $d$).

We assume that each edge in the trellis is labeled. Let $T = (V, E)$ be a trellis of rank $n$ such that each edge $e \in E$ is labeled with a real-valued number $\lambda(e) \in \mathbb{R}$. We now define the label of a path and the flow between two vertices.

Definition 1 (Path Labels): The label $\lambda(P)$ of a path $P = e_1 e_2 \cdots e_L$ is defined as the product $\lambda(P) = \lambda(e_1) \cdot \lambda(e_2) \cdots \lambda(e_L)$ of the labels of all edges in the path. (Note that the subscript indicates the sequence number rather than the edge's depth.)

Fig. 2. Trellis of rank $n = 4$ with vertex set $V = \{A, 1, 2, 3, 4, 5, 6, B\}$ and edge set $E = \{a, b, c, d, e, f, g, h, i, j, k, l\}$

¹In contrast to [8], we restrict our definitions and derivations to the set of real numbers.

Definition 2 (Flow): If $u$ and $v$ are vertices in a labeled trellis, we define the flow $\eta(u, v)$ from $u$ to $v$ to be the sum of the labels on all paths from $u$ to $v$, i.e.,
$$\eta(u, v) = \sum_{P : u \to v} \lambda(P).$$



In this paper, we only consider operations on the set of real numbers with ordinary addition and multiplication, because the authors are not aware of applications for other algebraic structures. However, Appendix C briefly shows that the algorithm can be transferred to any commutative semi-ring. This leads to a generalization of the Viterbi algorithm [3].

Example 3: We continue Example 2. The trellis depicted in Figure 2 is the trellis of the (4, 3, 2) single parity check code. In the BCJR algorithm, the edge labels $\lambda(e)$ are the channel probabilities of the corresponding transitions.
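The following minimal Python sketch makes Definitions 1 and 2 and Examples 2 and 3 concrete. It builds a trellis of the (4, 3, 2) single parity check code and evaluates the flow $\eta(A, B)$ by enumerating paths. Several choices are hypothetical and made only for illustration: vertices are named `(depth, parity)`, edges are tuples `(init, fin, c)`, and the helpers are called `spc_trellis`, `paths` and `flow`. The edge lettering of Figure 2 is not reproduced. When all $\lambda$-labels equal 1, the flow simply counts the eight code words.

```python
import math

def spc_trellis(n):
    """Hypothetical helper: minimal trellis of the (n, n-1, 2) single parity
    check code. A vertex is (depth, running parity); A = (0, 0), B = (n, 0).
    An edge is a tuple (init, fin, c) with bipolar c-label c in {+1, -1}."""
    edges = []
    for i in range(1, n + 1):
        for p in ((0,) if i == 1 else (0, 1)):    # parity states at depth i-1
            for c in (+1, -1):
                q = p ^ (c == -1)                  # a -1 flips the parity
                if i < n or q == 0:                # only parity 0 reaches B
                    edges.append(((i - 1, p), (i, q), c))
    return edges

def paths(edges, u, v):
    """Enumerate all paths (lists of edges) from vertex u to vertex v."""
    if u == v:
        yield []
    elif u[0] < v[0]:
        for e in edges:
            if e[0] == u:
                for rest in paths(edges, e[1], v):
                    yield [e] + rest

def flow(edges, lam, u, v):
    """Flow eta(u, v) of Definition 2, by brute-force path enumeration."""
    return sum(math.prod(lam(e) for e in P) for P in paths(edges, u, v))

edges = spc_trellis(4)
A, B = (0, 0), (4, 0)
print(len(edges), flow(edges, lambda e: 1.0, A, B))   # prints: 12 8.0

# With BSC channel probabilities as lambda-labels (Example 3), eta(A, B) is
# the sum of P(r | c) over all code words.
p, r = 0.1, [+1, -1, +1, +1]      # hypothetical crossover probability and hard decisions
bsc = lambda e: (1 - p) if e[2] == r[e[1][0] - 1] else p
print(flow(edges, bsc, A, B))
```

Brute-force enumeration is exponential in $n$. It is used here only as a reference for the recursions of Section III.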

III. Trellis-Based Computations 
In this section we consider distributions of the type
$$\mathcal{D} : q \mapsto \mathcal{D}(q) = \sum_{P : f(P) = q} \lambda(P)$$
for special functions $f$. That is, $q$ is mapped to the sum of the labels of all paths $P$ with $f(P) = q$. We present an algorithm to calculate these distributions over all paths of a trellis or over a subset of these paths. First, however, we develop algorithms to calculate the moments
$$\frac{\sum_P (f(P))^m \cdot \lambda(P)}{\sum_P \lambda(P)}$$
of such distributions in the trellis. By introducing a constraint on the paths, we also obtain the symbol moments
$$\frac{\sum_{P : c_i = x} (f(P))^m \cdot \lambda(P)}{\sum_{P : c_i = x} \lambda(P)}.$$
We show that the complexity of the moment calculation algorithm is $O(|E|)$, where $|E|$ is the number of edges in the trellis.

To each edge $e \in E$ of the trellis $T$ we assign a second label $c(e) \in \mathbb{R}$, which we will refer to as the c-label. For distinction, we will call $\lambda(e)$ the $\lambda$-label.



Example 4: We continue Example 3. Solid lines correspond to the c-label $c(e) = 1$, and dashed lines correspond to $c(e) = -1$ (bipolar binary notation). For example, the path $P = adik$ has the c-label $\mathbf{c}(P) = (+1, -1, -1, +1)$, which is a code word.

Let
$$g_i(c(e)) : x \mapsto y; \quad x, y \in \mathbb{R}$$
be a common function of $c(e)$ for all edges $e \in E_{i-1,i}$. Further, let
$$f(\mathbf{c}(P)) = f(c(e_1), c(e_2), \ldots, c(e_L)) : \mathbf{c} \mapsto y; \quad c(e_i), y \in \mathbb{R}$$
be a function of the c-labels of the edges of a path $P$ with length $L$. The bold letter indicates that $\mathbf{c}$ is a vector. For simplicity, we abbreviate $g_i(c(e))$ and $f(\mathbf{c}(P))$ by $g_i(e)$ and $f(P)$, respectively, in the following. The functions $f(P)$ have to fulfill the linearity criterion
$$f(P) = f(e_1 e_2 \cdots e_n) = g_1(e_1) + g_2(e_2) + \cdots + g_n(e_n) \tag{3}$$
for all paths $P : A \to B$.

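Definition 3 (Forward Numerator): We define the $m$-th forward numerator of a function $f$ at vertex $v$ of a trellis $T$ as
$$\alpha^{(m)}(v) := \sum_{P : A \to v} (f(P))^m \cdot \lambda(P) \tag{4}$$
with initial values
$$\alpha^{(m)}(A) = \begin{cases} 1 & m = 0 \\ 0 & m > 0. \end{cases}$$

Theorem 1 (Forward Recursion): The $m$-th forward numerator $\alpha^{(m)}(v)$ of a vertex $v \in V_i$ on depth $i$ can be calculated recursively on a trellis $T$ by
$$\alpha^{(m)}(v) = \sum_{e : \mathrm{fin}(e) = v} \lambda(e) \cdot \sum_{l=0}^{m} \binom{m}{l} (g_i(e))^l \cdot \alpha^{(m-l)}(\mathrm{init}(e)) \tag{5}$$
as in Algorithm 1.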



Algorithm 1 Computation of the First (M+1) Forward Numerators

01  /* initialization */
02  $\alpha^{(0)}(A) := 1$;
03  for (m = 1 to M)
04      $\alpha^{(m)}(A) := 0$;
05  /* recursion */
06  for (i = 1 to n) {
07      for ($v \in V_i$) {
08          for (m = 0 to M) {
09              $\alpha^{(m)}(v) := \sum_{e : \mathrm{fin}(e) = v} \lambda(e) \cdot \sum_{l=0}^{m} \binom{m}{l} (g_i(e))^l \cdot \alpha^{(m-l)}(\mathrm{init}(e))$;
10          }
11      }
12  }



Proof: The proof is by induction on $\mathrm{depth}(v)$. For $\mathrm{depth}(v) = 1$, it follows from the definition of a trellis that all paths from $A$ to $v$ consist of just one edge $e$, with $\mathrm{init}(e) = A$ and $\mathrm{fin}(e) = v$. Thus the true value of $\alpha^{(m)}(v)$ is the sum of the $\lambda$-labels on all edges $e$ joining $A$ to $v$, weighted by $(g_1(e))^m$. On the other hand, when the algorithm computes $\alpha^{(m)}(v)$ on line 9, the initialization $\alpha^{(0)}(A) = 1$ and $\alpha^{(m)}(A) = 0$ for $m > 0$ means it assigns the value
$$\alpha^{(m)}(v) = \sum_{e : \mathrm{fin}(e) = v} \lambda(e) \cdot \sum_{l=0}^{m} \binom{m}{l} (g_1(e))^l \cdot \alpha^{(m-l)}(A) = \sum_{e : A \to v} \lambda(e) \cdot (g_1(e))^m.$$
This is, as required, the sum of the labels on all edges $e$ joining $A$ to $v$, weighted by $(g_1(e))^m$. Thus the algorithm works correctly for all vertices $v$ with $\mathrm{depth}(v) = 1$ and any $m \ge 0$.

Assume now that the assertion is true for all vertices at depth $i$ or less and all $m \le M$, and consider a vertex $v$ at depth $i + 1$. When the algorithm computes $\alpha^{(m)}(v)$ on line 9, it assigns the value
$$\alpha^{(m)}(v) = \sum_{e : \mathrm{fin}(e) = v} \lambda(e) \cdot \sum_{l=0}^{m} \binom{m}{l} (g_{i+1}(e))^l \cdot \alpha^{(m-l)}(\mathrm{init}(e)). \tag{6}$$
Since $\mathrm{depth}(\mathrm{init}(e)) = i$, the induction hypothesis gives
$$\alpha^{(m)}(\mathrm{init}(e)) = \sum_{P : A \to \mathrm{init}(e)} \lambda(P) \cdot (f(P))^m. \tag{7}$$
Combining (6) and (7), we have
$$\alpha^{(m)}(v) = \sum_{e : \mathrm{fin}(e) = v} \lambda(e) \sum_{l=0}^{m} \binom{m}{l} (g_{i+1}(e))^l \sum_{P : A \to \mathrm{init}(e)} \lambda(P) \cdot (f(P))^{m-l} = \sum_{e : \mathrm{fin}(e) = v} \sum_{P : A \to \mathrm{init}(e)} \lambda(e) \lambda(P) \sum_{l=0}^{m} \binom{m}{l} (g_{i+1}(e))^l (f(P))^{m-l}.$$
Using the binomial theorem we obtain
$$\alpha^{(m)}(v) = \sum_{e : \mathrm{fin}(e) = v} \sum_{P : A \to \mathrm{init}(e)} \lambda(Pe) \cdot (f(P) + g_{i+1}(e))^m. \tag{8}$$
Every path from $A$ to $v$ must be of the form $Pe$, where $P$ is a path from $A$ to a vertex $u$ with $\mathrm{depth}(u) = i$, $\mathrm{init}(e) = u$ and $\mathrm{fin}(e) = v$. By (3), $f(Pe) = f(P) + g_{i+1}(e)$. Thus by (8), $\alpha^{(m)}(v)$ is correctly calculated by the algorithm. ■
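Algorithm 1 can be sketched in Python as follows. This is an illustration under stated assumptions, not a reference implementation. The trellis helper `spc_trellis`, the encoding of vertices as `(depth, state)`, and the callables `lam(e)` and `g(i, e)` (standing for $\lambda(e)$ and $g_i(e)$) are all hypothetical. Line 9 is evaluated edge by edge, which yields the same sum over $e : \mathrm{fin}(e) = v$. The result at $v = B$ is compared with a brute-force evaluation of Definition 3.

```python
from math import comb

def spc_trellis(n):
    """Hypothetical helper: trellis of the (n, n-1, 2) single parity check code.
    Vertices are (depth, parity) with A = (0, 0), B = (n, 0); an edge is a
    tuple (init, fin, c) with bipolar c-label c in {+1, -1}."""
    return [((i - 1, p), (i, p ^ (c == -1)), c)
            for i in range(1, n + 1)
            for p in ((0,) if i == 1 else (0, 1))
            for c in (+1, -1)
            if i < n or (p ^ (c == -1)) == 0]

def forward_numerators(edges, n, lam, g, M):
    """Algorithm 1 / Eq. (5): alpha[v][m] = sum over P: A -> v of f(P)^m lambda(P).
    lam(e) is the lambda-label, g(i, e) the separable term g_i(e) of Eq. (3)."""
    alpha = {(0, 0): [1.0] + [0.0] * M}             # lines 01-04: initial values at A
    for i in range(1, n + 1):                        # line 06: increasing depth
        for e in [d for d in edges if d[1][0] == i]:
            prev, gi = alpha[e[0]], g(i, e)
            acc = alpha.setdefault(e[1], [0.0] * (M + 1))
            for m in range(M + 1):                   # line 09, accumulated edge by edge
                acc[m] += lam(e) * sum(comb(m, l) * gi ** l * prev[m - l]
                                       for l in range(m + 1))
    return alpha

def brute_force(edges, n, lam, g, M):
    """Reference: sum over P: A -> B of f(P)^m lambda(P) by explicit enumeration."""
    def walk(v, f, w):
        if v == (n, 0):
            yield f, w
        for e in [d for d in edges if d[0] == v]:
            yield from walk(e[1], f + g(e[1][0], e), w * lam(e))
    return [sum(f ** m * w for f, w in walk((0, 0), 0.0, 1.0)) for m in range(M + 1)]

n, p, M = 4, 0.1, 3
r = [+1, -1, +1, +1]                  # hypothetical hard-decision received word
w = [0.8, -1.3, 0.4, 2.1]             # hypothetical word w, so that f(P) = c w^T
lam = lambda e: (1 - p) if e[2] == r[e[1][0] - 1] else p   # BSC probabilities
g = lambda i, e: e[2] * w[i - 1]      # g_i(e) = c_i w_i
edges = spc_trellis(n)
alpha = forward_numerators(edges, n, lam, g, M)
print(alpha[(n, 0)])                  # alpha^(m)(B) for m = 0..M
print(brute_force(edges, n, lam, g, M))
```

The recursion touches every edge once per order $m$. This is the source of the $O(|E|)$ behaviour discussed in Theorem 2, whereas `brute_force` grows with the number of paths.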

Remark 1 (Flow): $\alpha^{(0)}(v)$ in (4) is the flow $\eta(A, v)$ from $A$ to $v$ (cf. Definition 2), as calculated by the BCJR algorithm.

Remark 2: $f$ and $g_i$ do not necessarily have to be scalars. Theorem 1 holds for all separable linear functions $f$ fulfilling Equation (3).

Theorem 2 (Complexity): The proposed moment computing algorithm requires $O(|E|)$ arithmetic operations, i.e., multiplications and additions.

Proof: Calculating the powers of $g_i(e)$ up to a maximum moment $M$ for all edges $e \in E$ requires $|E| \cdot \max(M - 1, 0)$ multiplications and no additions. We do not count the operations needed for calculating $g_i(e)$ itself. The sum over $l$ in line 9 of the algorithm requires $m$ additions. It requires $2m + 1$ multiplications for $m > 0$ and no multiplications for $m = 0$.² Therefore line 9 requires
$$\rho^-(v) \cdot [1 + 2m + 1] = \rho^-(v) \cdot 2(m + 1)$$
multiplications for $m > 0$, $\rho^-(v)$ multiplications for $m = 0$, and $\rho^-(v) - 1 + \rho^-(v) \cdot m$ additions. Hence, a vertex $v \in V_i$ requires
$$\rho^-(v) + \sum_{m=1}^{M} \rho^-(v) \cdot 2(m + 1) = \rho^-(v) \cdot (M^2 + 3M + 1)$$
multiplications and
$$\sum_{m=0}^{M} \left(\rho^-(v) \cdot (m + 1) - 1\right) = \rho^-(v) \cdot \left(\tfrac{1}{2} M^2 + \tfrac{3}{2} M + 1\right) - (M + 1)$$
additions. The total number of multiplications required by the algorithm is thus
$$\text{mult} = (M^2 + 3M + 1) \cdot \sum_{i=1}^{n} \sum_{v \in V_i} \rho^-(v), \tag{9}$$
and the total number of additions is
$$\text{add} = \sum_{i=1}^{n} \sum_{v \in V_i} \left[\rho^-(v) \cdot \left(\tfrac{1}{2} M^2 + \tfrac{3}{2} M + 1\right) - (M + 1)\right] = \left(\tfrac{1}{2} M^2 + \tfrac{3}{2} M + 1\right) \cdot \sum_{i=1}^{n} \sum_{v \in V_i} \rho^-(v) - (M + 1) \cdot \sum_{i=1}^{n} \sum_{v \in V_i} 1. \tag{10}$$
Every edge in $E$ is counted exactly once in the sum in (9): if $e : u \to v$, then $\mathrm{fin}(e) \in V_i$ for exactly one value of $i \in \{1, 2, \ldots, n\}$. Thus the sum in (9) is $|E|$. The second sum in (10) is $|V| - 1$, since every vertex except $A$ is in $\bigcup_{i=1}^{n} V_i$. From (9) and (10) we therefore have
$$\text{mult} = (M^2 + 3M + 1) \cdot |E|, \qquad \text{add} = \left(\tfrac{1}{2} M^2 + \tfrac{3}{2} M + 1\right) \cdot |E| - (M + 1) \cdot (|V| - 1),$$
so that the total number of arithmetic operations required by the algorithm is
$$\left(\tfrac{3}{2} M^2 + \tfrac{9}{2} M + 2\right) \cdot |E| - (M + 1) \cdot |V| + M + 1 < \left(\tfrac{3}{2} M^2 + \tfrac{9}{2} M + 2\right) \cdot |E|.$$
We have $|V| > 1$, and $|E| - |V| + 1 \ge 0$ because the trellis is connected. Hence the total number of operations is bounded above by $\left(\tfrac{3}{2} M^2 + \tfrac{9}{2} M + 2\right) \cdot |E|$ and below by $\left(\tfrac{3}{2} M^2 + \tfrac{7}{2} M + 1\right) \cdot |E|$, disregarding the cost of computing $g_i(e)$. ■

²For $l = 0$, $(g_i(e))^0 = 1$ and thus only one multiplication is necessary.

In analogy to the forward numerator in Definition 3, we can also define a backward numerator.

Definition 4 (Backward Numerator): The $m$-th backward numerator of a vertex $v \in V_i$ is defined as
$$\beta^{(m)}(v) := \sum_{P : v \to B} (f(P))^m \cdot \lambda(P)$$
with initial values
$$\beta^{(m)}(B) = \begin{cases} 1 & m = 0 \\ 0 & m > 0. \end{cases}$$

Theorem 3 (Backward Recursion): The $m$-th backward numerator $\beta^{(m)}(v)$ of a vertex $v \in V_i$ can be calculated in a trellis $T$ by
$$\beta^{(m)}(v) = \sum_{e : \mathrm{init}(e) = v} \lambda(e) \cdot \sum_{l=0}^{m} \binom{m}{l} (g_{i+1}(e))^l \cdot \beta^{(m-l)}(\mathrm{fin}(e)).$$
Proof: The proof is analogous to the proof of Theorem 1.

It obviously holds that $\alpha^{(m)}(B) = \beta^{(m)}(A) =: \vartheta^{(m)}(T)$. This provides the $m$-th moment
$$\frac{\sum_P (f(P))^m \cdot \lambda(P)}{\sum_P \lambda(P)} = \frac{\vartheta^{(m)}(T)}{\vartheta^{(0)}(T)}$$
of the distribution of function $f$ given $T$.

In analogy to the BCJR algorithm [1] for calculating symbol probabilities, we next calculate moments under a constraint on the value of the c-labels at a certain depth $i$ in the trellis. That is, the moments are calculated in a sub-trellis of $T$.

Definition 5 (Symbol Moment): We define the $m$-th symbol moment $\vartheta_i^{(m)}(T, x)$ at depth $i$ of a trellis $T$ as
$$\vartheta_i^{(m)}(T, x) := \sum_{\substack{P : A \to B \\ c_i = x}} (f(P))^m \cdot \lambda(P),$$
where $c_i = c(e_i)$ and $e_i \in E_{i-1,i}$ is the $i$-th edge of path $P$.

Theorem 4: The $m$-th symbol moment can be calculated by
$$\vartheta_i^{(m)}(T, x) = \sum_{\substack{e \in E_{i-1,i} \\ c(e) = x}} \lambda(e) \sum_{l=0}^{m} \binom{m}{l} \beta^{(m-l)}(\mathrm{fin}(e)) \sum_{k=0}^{l} \binom{l}{k} (g_i(e))^k \cdot \alpha^{(l-k)}(\mathrm{init}(e)). \tag{11}$$
Proof: Let $P_H$ and $P_T$ denote the head and tail parts of the paths $P : A \to B$ through the trellis $T$, with an edge $e$ in between. That is, $P = P_H e P_T$ with $\mathrm{init}(P_H) = A$, $\mathrm{fin}(P_H) = \mathrm{init}(e)$, $\mathrm{fin}(e) = \mathrm{init}(P_T)$ and $\mathrm{fin}(P_T) = B$, for a given depth $i$ and $e \in E_{i-1,i}$. Then we can write
$$\vartheta_i^{(m)}(T, x) = \sum_{\substack{e \in E_{i-1,i} \\ c(e) = x}} \sum_{P_H : A \to \mathrm{init}(e)} \sum_{P_T : \mathrm{fin}(e) \to B} \left(f(P_H) + g_i(e) + f(P_T)\right)^m \cdot \lambda(P_H e P_T).$$
Applying the binomial theorem twice and separating the $\lambda$-labels, we obtain
$$\vartheta_i^{(m)}(T, x) = \sum_{\substack{e \in E_{i-1,i} \\ c(e) = x}} \lambda(e) \sum_{l=0}^{m} \binom{m}{l} \sum_{P_T} (f(P_T))^{m-l} \lambda(P_T) \sum_{k=0}^{l} \binom{l}{k} (g_i(e))^k \sum_{P_H} (f(P_H))^{l-k} \lambda(P_H).$$
Using the definitions of the forward and backward numerators finally yields the assertion of the theorem. ■
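The following sketch makes Theorems 3 and 4 concrete. It computes forward and backward numerators, checks $\alpha^{(m)}(B) = \beta^{(m)}(A)$, and evaluates the symbol moments with Eq. (11), comparing them against explicit path enumeration. It uses the same hypothetical conventions as the previous sketch: the helper `spc_trellis`, vertices `(depth, state)`, and `g(i, e)` standing for $g_i(e)$ of an edge in $E_{i-1,i}$. It is an illustration, not an optimized implementation, since edges are selected by depth with a linear scan.

```python
from math import comb

def spc_trellis(n):
    """Hypothetical helper: (n, n-1, 2) single parity check trellis. Vertices are
    (depth, parity), A = (0, 0), B = (n, 0); edges are (init, fin, c), c in {+1, -1}."""
    return [((i - 1, p), (i, p ^ (c == -1)), c)
            for i in range(1, n + 1)
            for p in ((0,) if i == 1 else (0, 1))
            for c in (+1, -1)
            if i < n or (p ^ (c == -1)) == 0]

def step(lam_e, g_e, prev, M):
    """lam_e * sum_l C(m, l) g_e^l prev[m - l] for m = 0..M (Eq. (5) and Theorem 3)."""
    return [lam_e * sum(comb(m, l) * g_e ** l * prev[m - l] for l in range(m + 1))
            for m in range(M + 1)]

def numerators(edges, n, lam, g, M):
    """Forward (Theorem 1) and backward (Theorem 3) numerators up to order M."""
    alpha = {(0, 0): [1.0] + [0.0] * M}
    beta = {(n, 0): [1.0] + [0.0] * M}
    for i in range(1, n + 1):                         # forward pass, depth 1..n
        for e in [d for d in edges if d[1][0] == i]:
            acc = alpha.setdefault(e[1], [0.0] * (M + 1))
            for m, val in enumerate(step(lam(e), g(i, e), alpha[e[0]], M)):
                acc[m] += val
    for i in range(n, 0, -1):                         # backward pass, depth n..1
        for e in [d for d in edges if d[1][0] == i]:
            acc = beta.setdefault(e[0], [0.0] * (M + 1))
            for m, val in enumerate(step(lam(e), g(i, e), beta[e[1]], M)):
                acc[m] += val
    return alpha, beta

def symbol_moment(edges, alpha, beta, lam, g, i, x, m):
    """Eq. (11): theta_i^(m)(T, x) from forward and backward numerators."""
    total = 0.0
    for e in [d for d in edges if d[1][0] == i and d[2] == x]:
        ge, a, b = g(i, e), alpha[e[0]], beta[e[1]]
        total += lam(e) * sum(
            comb(m, l) * b[m - l] * sum(comb(l, k) * ge ** k * a[l - k] for k in range(l + 1))
            for l in range(m + 1))
    return total

def brute_symbol_moment(edges, n, lam, g, i, x, m):
    """Reference: sum of f(P)^m lambda(P) over paths A -> B whose i-th c-label is x."""
    def walk(v, f, w, ok):
        if v == (n, 0):
            yield f, w, ok
        for e in [d for d in edges if d[0] == v]:
            depth = e[1][0]
            yield from walk(e[1], f + g(depth, e), w * lam(e), ok and (depth != i or e[2] == x))
    return sum(f ** m * w for f, w, ok in walk((0, 0), 0.0, 1.0, True) if ok)

n, p, M = 4, 0.1, 3
r = [+1, -1, +1, +1]                    # hypothetical hard-decision received word
w = [0.8, -1.3, 0.4, 2.1]               # hypothetical word w, f(P) = c w^T
lam = lambda e: (1 - p) if e[2] == r[e[1][0] - 1] else p
g = lambda i, e: e[2] * w[i - 1]
edges = spc_trellis(n)
alpha, beta = numerators(edges, n, lam, g, M)
print(alpha[(n, 0)], beta[(0, 0)])      # both equal theta^(m)(T), m = 0..M
print(symbol_moment(edges, alpha, beta, lam, g, 2, -1, 2))
```

Summing $\vartheta_i^{(m)}(T, +1)$ and $\vartheta_i^{(m)}(T, -1)$ recovers $\vartheta^{(m)}(T)$ at every depth $i$. This provides a convenient sanity check for an implementation.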

Theorem 5 (Computational Complexity): Given the forward numerators $\alpha^{(m)}(v)$ and the backward numerators $\beta^{(m)}(v)$ up to order $m$ for all $v \in V$, the computation of $\vartheta_i^{(m)}(T, x)$ for all $i \in \{1, \ldots, n\}$ requires $O(|E|)$ arithmetic operations.

Proof: Consider Equation (11). The sum over $k$ requires $2l$ multiplications and $l$ additions. The sum over $l$ requires
$$\sum_{l=0}^{m} (2l + 2) - 1 = m^2 + 3m + 1$$
multiplications and
$$\sum_{l=0}^{m} l + (m - 1) = \tfrac{1}{2}(m^2 + 3m - 2)$$
additions. There are at most $|E_{i-1,i}|$ edges $e$ with $e \in E_{i-1,i}$ and $c(e) = x$. Thus the sum over these edges requires at most $|E_{i-1,i}| \cdot ((m^2 + 3m + 1) + 1)$ multiplications and $|E_{i-1,i}| \cdot \tfrac{1}{2}(m^2 + 3m - 2) + |E_{i-1,i}| - 1$ additions. Since we calculate the symbol moments for all $i \in \{1, \ldots, n\}$, the requirements are bounded above by
$$\text{mult} \le |E| \cdot (m^2 + 3m + 2), \qquad \text{add} \le |E| \cdot \tfrac{1}{2}(m^2 + 3m).$$ ■



Remark 3 (Forward/Backward Moments): For numerical reasons it may be advantageous to directly compute the forward and backward moments
$$\frac{\alpha^{(m)}(v)}{\alpha^{(0)}(v)} \quad \text{and} \quad \frac{\beta^{(m)}(v)}{\beta^{(0)}(v)},$$
respectively, and to calculate and carry the 0-th numerators (flows) in the logarithmic domain.
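Remark 3 can be illustrated with a minimal sketch that assumes the same hypothetical trellis representation as the earlier sketches. Only the flow $\alpha^{(0)}(v)$ is carried in the logarithmic domain, via log-sum-exp. The normalized forward moments $\alpha^{(m)}(v)/\alpha^{(0)}(v)$ are propagated with the relative edge weights $\lambda(e)\,\alpha^{(0)}(\mathrm{init}(e))/\alpha^{(0)}(v)$. In the example, every edge gets $\lambda(e) = 10^{-3}$ and $n = 200$. The raw flow $2^{n-1} \cdot 10^{-3n}$ would then underflow in double precision, whereas its logarithm stays finite.

```python
import math
from math import comb

def spc_trellis(n):
    """Hypothetical helper: (n, n-1, 2) single parity check trellis. Vertices are
    (depth, parity), A = (0, 0), B = (n, 0); edges are (init, fin, c), c in {+1, -1}."""
    return [((i - 1, p), (i, p ^ (c == -1)), c)
            for i in range(1, n + 1)
            for p in ((0,) if i == 1 else (0, 1))
            for c in (+1, -1)
            if i < n or (p ^ (c == -1)) == 0]

def forward_log(edges, n, log_lam, g, M):
    """Carry log alpha^(0)(v) (the log-flow) and normalized forward moments
    mu[v][m] = alpha^(m)(v) / alpha^(0)(v). log_lam(e) = log lambda(e), g(i, e) = g_i(e)."""
    by_depth = {}
    for e in edges:
        by_depth.setdefault(e[1][0], []).append(e)
    log_flow = {(0, 0): 0.0}                          # log alpha^(0)(A) = log 1
    mu = {(0, 0): [1.0] + [0.0] * M}
    for i in range(1, n + 1):
        incoming = {}
        for e in by_depth[i]:
            incoming.setdefault(e[1], []).append(e)
        for v, es in incoming.items():
            logs = [log_lam(e) + log_flow[e[0]] for e in es]
            top = max(logs)                           # log-sum-exp for log alpha^(0)(v)
            log_flow[v] = top + math.log(sum(math.exp(x - top) for x in logs))
            mu[v] = [0.0] * (M + 1)
            for e, le in zip(es, logs):
                weight = math.exp(le - log_flow[v])   # relative weight of edge e
                ge, prev = g(i, e), mu[e[0]]
                for m in range(M + 1):
                    mu[v][m] += weight * sum(comb(m, l) * ge ** l * prev[m - l]
                                             for l in range(m + 1))
    return log_flow, mu

n = 200
edges = spc_trellis(n)
log_flow, mu = forward_log(edges, n, lambda e: math.log(1e-3), lambda i, e: e[2], 2)
print(log_flow[(n, 0)], mu[(n, 0)])
```

With equal labels and $g_i(e) = c(e)$, all code words are weighted equally. The normalized moments of $f(P) = \sum_i c_i$ at $B$ should therefore be close to $1$, $0$ and $n$ for $m = 0, 1, 2$.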

Finally in this section, we describe the calculation of distributions over all paths $P : A \to B$, or over a subset of paths, in the trellis. This proceeds in analogy to the calculation of moments and symbol moments, respectively.

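Definition 6 (Forward/Backward Distribution): We define the forward distribution $\alpha^{\mathcal{D}}(v)$ and the backward distribution $\beta^{\mathcal{D}}(v)$ at a vertex $v$ as the mapping functions
$$\alpha^{\mathcal{D}}(v) : q \mapsto \sum_{\substack{P : A \to v \\ f(P) = q}} \lambda(P)$$
and
$$\beta^{\mathcal{D}}(v) : q \mapsto \sum_{\substack{P : v \to B \\ f(P) = q}} \lambda(P).$$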

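Theorem 6: The forward distribution $\alpha^{\mathcal{D}}(v)$ at a vertex $v \in V_i$ can be calculated recursively in the trellis by
$$\alpha^{\mathcal{D}}(v) = \sum_{e : \mathrm{fin}(e) = v} \left(\alpha^{\mathcal{D}}(\mathrm{init}(e)) \boxplus g_i(e)\right) \cdot \lambda(e),$$
where $a(u) \boxplus b$ denotes a shift of the domain of the distribution $a(u)$ by $b$, and $\alpha^{\mathcal{D}}(A)$ equals the Dirac function. The calculation of $\beta^{\mathcal{D}}(v)$ is analogous, with $\beta^{\mathcal{D}}(B)$ being the Dirac function. The distribution $\vartheta^{\mathcal{D}}(T)$ and the symbol distribution $\vartheta_i^{\mathcal{D}}(T, x)$ can be calculated by
$$\vartheta^{\mathcal{D}}(T) = \sum_{v \in V_i} \alpha^{\mathcal{D}}(v) * \beta^{\mathcal{D}}(v)$$
and
$$\vartheta_i^{\mathcal{D}}(T, x) = \sum_{\substack{e \in E_{i-1,i} \\ c(e) = x}} \left(\alpha^{\mathcal{D}}(\mathrm{init}(e)) \boxplus g_i(e)\right) * \beta^{\mathcal{D}}(\mathrm{fin}(e)) \cdot \lambda(e),$$
respectively. Here, $*$ denotes the convolution operator, i.e., for two distributions $a(u)$ and $b(u)$ it holds that
$$a(u) * b(u) = \int a(\nu) \cdot b(u - \nu) \, d\nu.$$
Proof: Theorem 6 follows directly from Definition 6. ■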
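When $f(P)$ takes exact, hashable values, as in the hard-decision case, Theorem 6 can be sketched by storing distributions as Python dictionaries that map $q$ to the accumulated $\lambda$-weight. The domain shift $\boxplus$ then becomes a key offset, and the convolution becomes a double loop over keys. The names below (`spc_trellis`, `forward_distributions`, `backward_distributions`, `convolve`, `total_distribution`, `symbol_distribution`) are hypothetical. For real-valued $g_i(e)$ the number of keys can grow exponentially, which is the issue raised in Remark 6.

```python
from collections import defaultdict

def spc_trellis(n):
    """Hypothetical helper: (n, n-1, 2) single parity check trellis. Vertices are
    (depth, parity), A = (0, 0), B = (n, 0); edges are (init, fin, c), c in {+1, -1}."""
    return [((i - 1, p), (i, p ^ (c == -1)), c)
            for i in range(1, n + 1)
            for p in ((0,) if i == 1 else (0, 1))
            for c in (+1, -1)
            if i < n or (p ^ (c == -1)) == 0]

def forward_distributions(edges, n, lam, g):
    """alpha^D(v) as dicts {q: sum of lambda(P) over P: A -> v with f(P) = q}."""
    alpha = {(0, 0): {0: 1.0}}                      # Dirac function at A
    for i in range(1, n + 1):
        for e in [d for d in edges if d[1][0] == i]:
            acc = alpha.setdefault(e[1], defaultdict(float))
            for q, weight in alpha[e[0]].items():   # shift by g_i(e), scale by lambda(e)
                acc[q + g(i, e)] += weight * lam(e)
    return alpha

def backward_distributions(edges, n, lam, g):
    """beta^D(v), computed analogously starting from the Dirac function at B."""
    beta = {(n, 0): {0: 1.0}}
    for i in range(n, 0, -1):
        for e in [d for d in edges if d[1][0] == i]:
            acc = beta.setdefault(e[0], defaultdict(float))
            for q, weight in beta[e[1]].items():
                acc[q + g(i, e)] += weight * lam(e)
    return beta

def convolve(a, b):
    """Discrete convolution of two distributions stored as dicts."""
    out = defaultdict(float)
    for qa, wa in a.items():
        for qb, wb in b.items():
            out[qa + qb] += wa * wb
    return out

def total_distribution(alpha, beta, i):
    """theta^D(T) = sum over v in V_i of alpha^D(v) * beta^D(v)."""
    out = defaultdict(float)
    for v in [u for u in alpha if u[0] == i]:
        for q, w in convolve(alpha[v], beta[v]).items():
            out[q] += w
    return out

def symbol_distribution(edges, alpha, beta, lam, g, i, x):
    """theta_i^D(T, x): sum over edges e in E_{i-1,i} with c(e) = x."""
    out = defaultdict(float)
    for e in [d for d in edges if d[1][0] == i and d[2] == x]:
        shifted = {q + g(i, e): w for q, w in alpha[e[0]].items()}
        for q, w in convolve(shifted, beta[e[1]]).items():
            out[q] += w * lam(e)
    return out

n = 4
edges = spc_trellis(n)
lam = lambda e: 1.0                     # unit labels: weights count code words
g = lambda i, e: e[2]                   # g_i(e) = c(e), so f(P) = sum of c-labels
alpha = forward_distributions(edges, n, lam, g)
beta = backward_distributions(edges, n, lam, g)
print(dict(total_distribution(alpha, beta, 2)))
print(dict(symbol_distribution(edges, alpha, beta, lam, g, 1, +1)))
```

For the (4, 3, 2) code with unit labels, the total distribution is $\{4: 1, 0: 6, -4: 1\}$ at every depth. The symbol distributions at depth 1 split it into $\{4: 1, 0: 3\}$ for $c_1 = +1$ and $\{0: 3, -4: 1\}$ for $c_1 = -1$.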

Remark 4 (Density Distributions): Normalizing distributions by the corresponding flow yields density distributions.

Remark 5 (Probability Density Functions): If the $\lambda(e)$ are probabilities, the normalized distributions are probability density functions with the mapping $f(P) \mapsto P(f(P))$.

Remark 6: Using Theorem 6, the calculation on the trellis does not reduce complexity in general (except in the hard decision case), because an infinite resolution of the domain of $\alpha^{\mathcal{D}}(v)$ etc. is required. However, Appendix B introduces an algorithm that approximates Theorem 6 and does reduce complexity.

Remark 7: We can determine the distribution and its moments not only for a trellis or sub-trellis, but also for a single edge.

Remark 8: Example 1 gives the symbol distributions for two sub-trellises of the [7 5]oct convolutional code, namely the sub-codes with $i$-th code bit $c_i = +1$ and $c_i = -1$, respectively. The curves obtained by Gaussian approximation almost coincide with the ones plotted in Figure 1.

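Remark 9: It is straightforward to extend the proposed algorithm to the calculation of joint moments of two or more functions. For example,
$$\vartheta^{(k,m)} := \frac{\sum_P (f_1(P))^k (f_2(P))^m \cdot \lambda(P)}{\sum_P \lambda(P)}$$
can be calculated using
$$\alpha^{(k,m)}(v) := \sum_{P : A \to v} (f_1(P))^k (f_2(P))^m \cdot \lambda(P)$$
and
$$\alpha^{(k,m)}(v) = \sum_{e : \mathrm{fin}(e) = v} \lambda(e) \sum_{l=0}^{k} \sum_{j=0}^{m} \binom{k}{l} \binom{m}{j} (g_{1,i}(e))^l (g_{2,i}(e))^j \cdot \alpha^{(k-l, m-j)}(\mathrm{init}(e)),$$
respectively, with $i = \mathrm{depth}(v)$. Here $g_{1,i}$ and $g_{2,i}$ denote the separable terms of $f_1$ and $f_2$ according to (3).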
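Remark 9 can be sketched in the same hypothetical setting. `g1(i, e)` and `g2(i, e)` stand for $g_{1,i}(e)$ and $g_{2,i}(e)$, and the nested sums over $l$ and $j$ follow the recursion for $\alpha^{(k,m)}(v)$ above. The comparison with explicit path enumeration serves only to validate the sketch. The two words used for $f_1$ and $f_2$ are illustrative assumptions.

```python
from math import comb

def spc_trellis(n):
    """Hypothetical helper: (n, n-1, 2) single parity check trellis. Vertices are
    (depth, parity), A = (0, 0), B = (n, 0); edges are (init, fin, c), c in {+1, -1}."""
    return [((i - 1, p), (i, p ^ (c == -1)), c)
            for i in range(1, n + 1)
            for p in ((0,) if i == 1 else (0, 1))
            for c in (+1, -1)
            if i < n or (p ^ (c == -1)) == 0]

def joint_forward(edges, n, lam, g1, g2, K, M):
    """alpha[v][k][m] = sum over P: A -> v of f1(P)^k f2(P)^m lambda(P)."""
    def zeros():
        return [[0.0] * (M + 1) for _ in range(K + 1)]
    alpha = {(0, 0): zeros()}
    alpha[(0, 0)][0][0] = 1.0                        # initial value at A
    for i in range(1, n + 1):
        for e in [d for d in edges if d[1][0] == i]:
            prev, x, y = alpha[e[0]], g1(i, e), g2(i, e)
            acc = alpha.setdefault(e[1], zeros())
            for k in range(K + 1):
                for m in range(M + 1):
                    acc[k][m] += lam(e) * sum(
                        comb(k, l) * comb(m, j) * x ** l * y ** j * prev[k - l][m - j]
                        for l in range(k + 1) for j in range(m + 1))
    return alpha

def brute_joint(edges, n, lam, g1, g2, k, m):
    """Reference: explicit enumeration of all paths A -> B."""
    def walk(v, f1, f2, w):
        if v == (n, 0):
            yield f1, f2, w
        for e in [d for d in edges if d[0] == v]:
            i = e[1][0]
            yield from walk(e[1], f1 + g1(i, e), f2 + g2(i, e), w * lam(e))
    return sum(a ** k * b ** m * w for a, b, w in walk((0, 0), 0.0, 0.0, 1.0))

n, p = 4, 0.1
r = [+1, -1, +1, +1]                    # hypothetical received word (also used for f1)
w2 = [1.0, 0.5, -0.7, 1.2]              # hypothetical second word for f2
lam = lambda e: (1 - p) if e[2] == r[e[1][0] - 1] else p
g1 = lambda i, e: e[2] * r[i - 1]       # f1(P) = c r^T
g2 = lambda i, e: e[2] * w2[i - 1]      # f2(P) = c w2^T
edges = spc_trellis(n)
alpha = joint_forward(edges, n, lam, g1, g2, K=2, M=2)
print(alpha[(n, 0)][1][1] / alpha[(n, 0)][0][0])   # normalized joint moment theta^(1,1)
```

Each edge now costs $O(K^2 M^2)$ operations instead of $O(M^2)$, but the dependence on $|E|$ remains linear.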



IV. Applications

We now apply the results of Section III to linear block codes. We compute the moments
$$E_c\left[(H(c \mid w))^m \mid r, c_i = x\right] = \sum_{c \in C} (H(c \mid w))^m \, P(c \mid r, c_i = x)$$
of the distribution
$$\mathcal{D} : q = H(c \mid w) \mapsto P(q \mid r, c_i = x) = \sum_{\substack{c \in C : \\ H(c \mid w) = q}} P(c \mid r, c_i = x)$$
over all code words $c \in C$, given a received word $r$ and the $i$-th code bit $c_i = x \in \{-1, 1\}$. Here
$$H(c \mid w) = -\log_2 P(c \mid w)$$
is the conditional uncertainty of $c$ given a word $w$, and $P(c \mid r)$ is the conditional probability of $c$ given $r$. These moments are required, e.g., for the discriminated belief propagation algorithm in [2]. As a special case we can calculate the conditional mean uncertainty, or entropy,
$$H(C \mid r) = \sum_{c \in C} H(c \mid r) \cdot P(c \mid r)$$
of a code or sub-code given $r$.

Both for hard decision (BSC) and for soft decision (AWGN channel), the conditional uncertainty is linearly related to the correlation $c w^T$ (cf. Appendix A):
$$H(c \mid w) = K_1 + K_2 \cdot c w^T. \tag{12}$$
Here $K_1$ and $K_2$ are constant functions of the error probability and of the vector $w$, assuming equiprobable code words. Applying the binomial theorem gives
$$\sum_{c \in C} (H(c \mid w))^m \cdot P(c \mid r, c_i = x) = \sum_{c \in C} (K_1 + K_2 \cdot c w^T)^m \cdot P(c \mid r, c_i = x) = \sum_{c \in C} \sum_{l=0}^{m} \binom{m}{l} K_1^{m-l} K_2^{l} (c w^T)^l \, P(c \mid r, c_i = x).$$
It is therefore sufficient to calculate the moments
$$E_c\left[(c w^T)^m \mid r, c_i = x\right] = \sum_{c \in C} (c w^T)^m \cdot P(c \mid r, c_i = x) \tag{13}$$
of the correlation $c w^T$ on the trellis, which we do in the following.

Consider a binary linear block code $C$ of length $n$ that is representable in a trellis, e.g., a terminated convolutional code. Let the c-labels $c(e) = c_i \in \{\pm 1\}$ be the bipolar representation of the code bit labeling edge $e$. Each path $P : A \to B$ has a sequence $\mathbf{c}(P)$ of $n$ c-labels representing a code word $c \in C$. Let $r = [r_1 r_2 \cdots r_n]$, $r_i \in \mathbb{R}$, be the noisy version of a code word $c$ after transmission over a memoryless channel. Let the $\lambda$-label of a path $P$ be the conditional probability of the received word $r$ given the code word $c$, i.e., $\lambda(P) = P(r \mid c)$. Further, let the function $f$ of the paths' c-labels (i.e., the function of the code words) be the correlation (inner product) of $w$ and $c$:
$$f(P) = f(\mathbf{c}(P)) = c w^T = \sum_{i=1}^{n} c_i w_i.$$
Hence $g_i(e) = c_i w_i$, and the separability criterion (3) is fulfilled. In the trellis of $C$, for each vertex $v \in V$ the c-labels $c(e)$ of the edges $\{e : \mathrm{init}(e) = v\}$ leaving $v$ are distinct. Therefore each code word $c$ maps one-to-one to a path $P$ in the trellis, and we can apply the theorems of Section III, replacing $\sum_P$ by $\sum_c$. Applying Bayes' rule to (13) with equiprobable code words gives
$$E_c\left[(c w^T)^m \mid r, c_i = x\right] = \frac{\sum_{c \in C : c_i = x} (c w^T)^m \cdot P(r \mid c)}{\sum_{c \in C : c_i = x} P(r \mid c)}.$$
Comparing with Definition 5, we observe that Theorems 1 and 3 hold. Hence these moments can be calculated in the trellis according to Theorem 4 as the symbol moments
$$E_c\left[(c w^T)^m \mid r, c_i = x\right] = \frac{\vartheta_i^{(m)}(T, x)}{\vartheta_i^{(0)}(T, x)}.$$
Analogously, without the code bit constraint $c_i = x$, the moments are given by
$$E_c\left[(c w^T)^m \mid r\right] = \frac{\vartheta^{(m)}(T)}{\vartheta^{(0)}(T)}.$$
For $w = r$, $m = 1$ and $g_i(e) = c_i r_i$, we can thus calculate the conditional entropies
$$H(C \mid r) = \sum_{c \in C} H(c \mid r) \cdot P(c \mid r) = K_1 + K_2 \cdot \frac{\vartheta^{(1)}(T)}{\vartheta^{(0)}(T)}$$
and
$$H(C_i(x) \mid r) = \sum_{c \in C_i(x)} H(c \mid r) \cdot P(c \mid r, c_i = x) = K_1 + K_2 \cdot \frac{\vartheta_i^{(1)}(T, x)}{\vartheta_i^{(0)}(T, x)}$$
of the code $C$ and of the sub-code $C_i(x) = \{c \in C : c_i = x\}$ given $r$, respectively. $H(C \mid r)$ can also be calculated with the classical BCJR algorithm, but the conditional entropy of $C_i(x)$ cannot.
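The linear relation (12) and its use for $H(C \mid r)$ can be checked numerically on a toy example. The following sketch assumes a BSC with hard decisions, equiprobable code words and $w = r$. It uses the constants derived in Appendix A, written in the sign convention of (12), so that $K_2 < 0$ for $p < 1/2$. For brevity it enumerates the code words of a small single parity check code instead of running the trellis recursions. In the trellis, the mean correlation `mean_corr` would be obtained as $\vartheta^{(1)}(T)/\vartheta^{(0)}(T)$. All numerical parameters are hypothetical.

```python
import math
from itertools import product

n, p = 6, 0.2                                      # hypothetical length and BSC crossover probability
code = [c for c in product((+1, -1), repeat=n) if c.count(-1) % 2 == 0]   # SPC code, bipolar
r = (+1, -1, +1, +1, -1, -1)                       # hypothetical hard-decision received word, w = r

def likelihood(c):
    """P(r | c) on a memoryless BSC."""
    d = sum(ci != ri for ci, ri in zip(c, r))      # Hamming distance d_H(c, r)
    return p ** d * (1 - p) ** (n - d)

def corr(c):
    """Correlation c r^T."""
    return sum(ci * ri for ci, ri in zip(c, r))

Z = sum(likelihood(c) for c in code)               # equiprobable code words: P(c|r) = P(r|c) / Z
post = {c: likelihood(c) / Z for c in code}

# Direct evaluation: H(C|r) = sum_c H(c|r) P(c|r) with H(c|r) = -log2 P(c|r)
H_direct = -sum(q * math.log2(q) for q in post.values())

# Linear form (12) with constants from Appendix A (sign convention of (12))
K2 = -0.5 * math.log2((1 - p) / p)
K1 = math.log2(Z) - n * math.log2(1 - p) + 0.5 * n * math.log2((1 - p) / p)
mean_corr = sum(corr(c) * q for c, q in post.items())   # trellis: theta^(1)(T) / theta^(0)(T)
H_linear = K1 + K2 * mean_corr

# Sub-code C_1(+1): average of H(c|r) under P(c | r, c_1 = +1)
sub = {c: q for c, q in post.items() if c[0] == +1}
P_sub = sum(sub.values())
H_sub = sum(-math.log2(q) * q / P_sub for q in sub.values())
mean_sub = sum(corr(c) * q / P_sub for c, q in sub.items())
print(H_direct, H_linear, H_sub, K1 + K2 * mean_sub)
```

Because $H(c \mid r) = K_1 + K_2 \cdot c r^T$ holds for every code word individually, any average over a (sub-)code distribution reduces to an average of the correlation. This is why first moments from the trellis suffice.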

Remark 10: For a convolutional code with $c$ outputs, each edge in the trellis is assigned $c$ code symbols. To apply our definition of a single symbol label per edge, each edge $e$ of the original trellis is replaced by a path $e'_1 e'_2 \cdots e'_c$ of $c$ edges that fulfill
$$\mathrm{init}(e) = \mathrm{init}(e'_1), \quad \mathrm{fin}(e'_1) = \mathrm{init}(e'_2), \quad \ldots, \quad \mathrm{fin}(e'_c) = \mathrm{fin}(e),$$
and one code symbol is assigned to each edge $e'_j$.
Example 5: Figure 1 shows the distribution $P(c r^T, c_i = \pm 1 \mid r)$ over $c r^T$ for the [5 7]oct convolutional code of length $n = 200$. The received word $r$ results from transmission over a BSC with bit error probability $p = 0.35$. The curves are the normalized symbol distributions $\vartheta_i^{\mathcal{D}}(T, \pm 1)$ weighted by the probability $P(c_i = \pm 1 \mid r)$.

Example 6: Figure 3 shows a distribution of the terminated [7 5]oct convolutional code together with its Gaussian approximation from the first two moments, for a BSC.

Fig. 3. Distribution and its Gaussian Approximation

V. Conclusions

A trellis represents a general distribution that can be marginalized, e.g., with respect to edge labels. We presented two algorithms for computations on the trellis. The first calculates distributions. The second computes their moments, which allow the distributions to be approximated. The latter was derived by generalizing the forward/backward recursion known from the BCJR algorithm. The results were transferred to the concrete problem of computing the moments of the conditional distribution of the correlation between a block code and some given word. The moment calculation algorithm is required for an efficient implementation of the discriminated belief propagation algorithm in [2]. It can also be used to calculate the conditional entropy of a code or sub-code. Although not the focus of this paper, the Appendix shows that the algorithm is not restricted to calculations with real numbers. It is valid for any commutative semi-ring and thus provides a generalization of the Viterbi algorithm. The asymptotic complexity of the moment computation algorithm is the same as that of the BCJR algorithm.

Appendix

A. Relation between Uncertainty and Correlation

The conditional uncertainty of a code word $c$ given a word $w$ is defined as
$$H(c \mid w) := -\log_2 P(c \mid w) = -\log_2 P(w \mid c) + \log_2 \frac{P(w)}{P(c)} = -\log_2 P(w \mid c) + K_{1a},$$
where $K_{1a}$ is a constant under the assumption of equiprobable code words. Assuming further that $w_i$ is independent of $c_j$ for $i \neq j$, it follows that
$$\log_2 P(w \mid c) = \log_2 \prod_{i=1}^{n} P(w_i \mid c_i) = \sum_{i=1}^{n} \log_2 P(w_i \mid c_i).$$

• For a binary symmetric channel (BSC) with $w_i, c_i \in \{\pm 1\}$ and error probability $p$, the Hamming distance between $c$ and $w$ is $d_H(c, w) = (n - c w^T)/2$, so that
$$\log_2 P(w \mid c) = d_H \log_2 p + (n - d_H) \log_2 (1 - p) = K_{1b} + K_2 \cdot c w^T, \qquad K_2 = \tfrac{1}{2} \log_2 \frac{1 - p}{p}.$$

• For an AWGN channel with noise variance $\sigma^2$ we obtain (note that $P(w \mid c)$ is actually the Gaussian probability density)
$$\log_2 P(w \mid c) = -\frac{n}{2} \log_2 (2 \pi \sigma^2) - \frac{\log_2 e}{2 \sigma^2} \sum_{i=1}^{n} (w_i - c_i)^2 = K_{1b} + K_2 \cdot c w^T, \qquad K_2 = \frac{\log_2 e}{\sigma^2}.$$

In either case we can express the conditional uncertainty as
$$H(c \mid w) = (K_{1a} - K_{1b}) - K_2 \cdot c w^T,$$
i.e., the uncertainty is linearly related to the correlation.



B. Calculating the Actual Distribution

Consider a trellis of rank $n$ with $g_i(e) \in \{\pm 1\}$, which is the case for hard decision decoding. The domain of the distributions, i.e., the set of values that $f(P)$ can take, is $\mathcal{B} = \{-n, -n+2, \ldots, n-2, n\}$ with cardinality $|\mathcal{B}| = n + 1$. In this case the distributions can be implemented directly as vectors of length $n + 1$. A shift $\boxplus$ of the domain is simply a shift of the vector contents, and the convolution operation $*$ is discrete.
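The hard-decision case can be sketched with NumPy vectors, assuming $g_i(e) = c(e) \in \{\pm 1\}$. For simplicity, the sketch uses vectors of length $2n + 1$ indexed by $q + n$ rather than the $n + 1$ entries of $\mathcal{B}$, since the shorter form would require tracking the parity of each depth. `shift` implements $\boxplus$ without wrap-around, and `np.convolve` performs the discrete convolution. The helper names are hypothetical.

```python
import numpy as np

def spc_trellis(n):
    """Hypothetical helper: (n, n-1, 2) single parity check trellis. Vertices are
    (depth, parity), A = (0, 0), B = (n, 0); edges are (init, fin, c), c in {+1, -1}."""
    return [((i - 1, p), (i, p ^ (c == -1)), c)
            for i in range(1, n + 1)
            for p in ((0,) if i == 1 else (0, 1))
            for c in (+1, -1)
            if i < n or (p ^ (c == -1)) == 0]

def shift(vec, s):
    """Domain shift by s positions (the operation 'boxplus'), without wrap-around."""
    out = np.zeros_like(vec)
    if s >= 0:
        out[s:] = vec[:len(vec) - s]
    else:
        out[:s] = vec[-s:]
    return out

def distributions(edges, n, lam):
    """Forward and backward distribution vectors for g_i(e) = c(e); index q + n."""
    size = 2 * n + 1
    alpha = {(0, 0): np.zeros(size)}
    beta = {(n, 0): np.zeros(size)}
    alpha[(0, 0)][n] = beta[(n, 0)][n] = 1.0        # Dirac functions at A and B
    for i in range(1, n + 1):
        for e in [d for d in edges if d[1][0] == i]:
            alpha.setdefault(e[1], np.zeros(size))
            alpha[e[1]] += lam(e) * shift(alpha[e[0]], e[2])
    for i in range(n, 0, -1):
        for e in [d for d in edges if d[1][0] == i]:
            beta.setdefault(e[0], np.zeros(size))
            beta[e[0]] += lam(e) * shift(beta[e[1]], e[2])
    return alpha, beta

def total_distribution(alpha, beta, n, i):
    """theta^D(T) = sum over v in V_i of alpha^D(v) * beta^D(v), returned with index q + n."""
    full = sum(np.convolve(alpha[v], beta[v]) for v in alpha if v[0] == i)
    return full[n:3 * n + 1]                         # full index is q + 2n

n = 4
edges = spc_trellis(n)
alpha, beta = distributions(edges, n, lambda e: 1.0)
print(total_distribution(alpha, beta, n, 2))        # weights for q = -4, ..., 4
```

With unit labels, the weights count code words. For the (4, 3, 2) code the result is the same at every depth: one word at $q = \pm 4$ and six words at $q = 0$.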

In the case of soft decision, the domain needs to be quantized. For Gaussian distributions, an efficient way to perform uniform mid-tread quantization is to carry along the mean value $\mu$ of the distribution and to arrange the partitions symmetrically on both sides of it, storing the partition contents in vectors $d$. When a path $P$ is extended by an edge $e \in E_{i-1,i}$ in the forward/backward recursion (lengthening), the domain of $f(P)$ is shifted by $g_i(e)$, i.e., $g_i(e)$ is added to $\mu$. However, when paths are joined in a vertex, the mean values of the incoming path distributions usually do not coincide. Hence a new mean value $\mu_{\mathrm{new}}$ has to be determined, and the partition contents need to be redistributed.

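Let the vectors $d$ be of length $2N + 1$, each element corresponding to a partition of width $\Delta u$. The partitions are indexed by $j \in \{-N, -N+1, \ldots, N\}$, where $j = 0$ denotes the center partition around the mean value. The new mean value $\mu_{\mathrm{new}}$ is the weighted sum of the mean values $\mu_{\mathrm{in}}$ of the involved distributions in vectors $d_{\mathrm{in}}$. For example, for the forward recursion,
$$\mu_{\mathrm{new}} = \bar{\alpha}^{\mathcal{D}}(v) := \sum_{e : \mathrm{fin}(e) = v} \left(\bar{\alpha}^{\mathcal{D}}(\mathrm{init}(e)) + g_i(e)\right) \cdot \underbrace{\frac{\lambda(e) \cdot \eta(A, \mathrm{init}(e))}{\eta(A, v)}}_{\text{relative weight of edge } e}$$
with $\bar{\alpha}^{\mathcal{D}}(A) = 0$. Here $\mu_{\mathrm{in}} = \bar{\alpha}^{\mathcal{D}}(\mathrm{init}(e)) + g_i(e)$ is the mean value of the distribution $\alpha^{\mathcal{D}}(\mathrm{init}(e))$ after lengthening by $g_i(e)$, and $\bar{\alpha}^{\mathcal{D}}(v)$ is the mean of the forward distribution $\alpha^{\mathcal{D}}(v)$. The final distribution vector $d_{\mathrm{new}} = \alpha^{\mathcal{D}}(v)$ is the weighted sum of the vectors $d_{\mathrm{out}}$. These are calculated by distributing the content of the vectors $d_{\mathrm{in}} = \alpha^{\mathcal{D}}(\mathrm{init}(e))$ according to the new partition margins, with $\Delta\mu = \mu_{\mathrm{new}} - \mu_{\mathrm{in}}$, as follows (cf. Figure 4).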
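$d_{\mathrm{out}} := \mathbf{0}$ (all-zero vector)
for $\max(-N, -N + l + 1) \le j \le \min(N, N + l)$:
    $d_{\mathrm{out}}[j - l] \mathrel{+}= (1 - \epsilon) \cdot d_{\mathrm{in}}[j]$
    $d_{\mathrm{out}}[j - l - 1] \mathrel{+}= \epsilon \cdot d_{\mathrm{in}}[j]$

Here $l = \lfloor \Delta\mu / \Delta u \rfloor$ and $\epsilon = \Delta\mu / \Delta u - l$, and $a \mathrel{+}= b$ denotes the addition of $b$ to $a$, i.e., $a := a + b$. The forward distribution vector $\alpha^{\mathcal{D}}(A)$ is initialized with the discrete Dirac function ($d[0] = 1$, $d[j] = 0$ for $j \neq 0$) and mean value zero. The backward distribution is computed analogously.

With the two procedures of lengthening and joining, the mean value of the symbol distribution can be calculated by
$$\bar{\vartheta}_i^{\mathcal{D}}(T, x) = \sum_{\substack{e \in E_{i-1,i} \\ c(e) = x}} \left(\bar{\alpha}^{\mathcal{D}}(\mathrm{init}(e)) + g_i(e) + \bar{\beta}^{\mathcal{D}}(\mathrm{fin}(e))\right) \cdot \underbrace{\frac{\eta(A, \mathrm{init}(e)) \cdot \lambda(e) \cdot \eta(\mathrm{fin}(e), B)}{\sum_{e' \in E_{i-1,i} : c(e') = x} \eta(A, \mathrm{init}(e')) \cdot \lambda(e') \cdot \eta(\mathrm{fin}(e'), B)}}_{\text{relative weight of edge } e}.$$
The discrete symbol distribution vector $d_{\mathrm{out}} = \vartheta_i^{\mathcal{D}}(T, x)$ is obtained as follows. For each edge $e \in E_{i-1,i}$ with $c(e) = x$, the forward and backward distribution vectors $\alpha^{\mathcal{D}}(\mathrm{init}(e))$ and $\beta^{\mathcal{D}}(\mathrm{fin}(e))$ are convolved,
$$d_{\mathrm{in}} = \alpha^{\mathcal{D}}(\mathrm{init}(e)) * \beta^{\mathcal{D}}(\mathrm{fin}(e)),$$
and the vector contents of $d_{\mathrm{in}}$ are then redistributed, with weights, to $d_{\mathrm{out}}$.

C. Generalization to Calculations on a Semi-ring

In the main part of this paper, the computation of moments in the trellis is introduced for real numbers. However, the algorithm is valid for the more general algebraic structure of commutative semi-rings. The 0-th forward moment then results in the Viterbi algorithm on semi-rings.

Let the $\lambda$-label and the c-label come from an algebraic set $\mathbb{S}$ that is closed under the two binary operations $\oplus$ and $\odot$, called addition and multiplication, which satisfy the following axioms:

• The operation $\odot$ is associative and commutative, and there is an identity element $1_\odot$ such that $s \odot 1_\odot = 1_\odot \odot s = s$ for all $s \in \mathbb{S}$, making $(\mathbb{S}, \odot)$ a commutative monoid.

• The operation $\oplus$ is associative and commutative, and there is an identity element $0_\oplus$ such that $s \oplus 0_\oplus = 0_\oplus \oplus s = s$ for all $s \in \mathbb{S}$, making $(\mathbb{S}, \oplus)$ a commutative monoid.

• The distributive law $(x \oplus y) \odot z = (x \odot z) \oplus (y \odot z)$ holds for all triples $(x, y, z)$ from $\mathbb{S}$.

• The identity element $0_\oplus$ of the addition annihilates $\mathbb{S}$, i.e., $0_\oplus \odot s = s \odot 0_\oplus = 0_\oplus$ for all $s \in \mathbb{S}$.

The triple $(\mathbb{S}, \oplus, \odot)$ is called a commutative semiring.

Let $a, b \in (\mathbb{S}, \oplus, \odot)$ be elements of such a commutative semiring. We define the following notation:
$$a^m = \begin{cases} \underbrace{a \odot a \odot \cdots \odot a}_{m} & m \in \mathbb{N} \\ 1_\odot & m = 0 \end{cases} \qquad n\,a = \begin{cases} \underbrace{a \oplus a \oplus \cdots \oplus a}_{n} & n \in \mathbb{N} \\ 0_\oplus & n = 0 \end{cases}$$
with $n\,a \odot b = n\,(a \odot b)$, where $\mathbb{N}$ is the set of natural numbers. The binomial theorem can then be written as
$$(a \oplus b)^m = \bigoplus_{l=0}^{m} \binom{m}{l} \, a^l \odot b^{m-l}, \qquad m, l \in \mathbb{N}_0, \; a, b \in (\mathbb{S}, \oplus, \odot),$$
with the binomial coefficient $\binom{m}{l} \in \mathbb{N}_0 = \mathbb{N} \cup \{0\}$. In analogy to Definition 3 and Theorem 1, we can now define the forward numerator and its calculation on a semi-ring.

Definition 7: We define the $m$-th forward numerator of a function $f \in (\mathbb{S}, \oplus, \odot)$ at vertex $v$ of a trellis $T$ as
$$\alpha^{(m)}(v) := \bigoplus_{P : A \to v} \lambda(P) \odot (f(P))^m \tag{15}$$
with initial values
$$\alpha^{(m)}(A) = \begin{cases} 1_\odot & m = 0 \\ 0_\oplus & m > 0. \end{cases}$$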
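Theorem 7: The $m$-th forward moment $\alpha^{(m)}(v)$ of a vertex $v \in V_i$ on depth $i$ can be calculated recursively on a trellis $T$ and a commutative semiring $(\mathbb{S}, \oplus, \odot)$ by
$$\alpha^{(m)}(v) = \bigoplus_{e : \mathrm{fin}(e) = v} \lambda(e) \odot \bigoplus_{l=0}^{m} \binom{m}{l} (g_i(e))^l \odot \alpha^{(m-l)}(\mathrm{init}(e)) \tag{16}$$
for all functions $f(P : A \to v)$ and $g_j$, $j = 1, \ldots, i$, that fulfill
$$f(P) = f(e_1 e_2 \cdots e_i) = g_1(e_1) \oplus g_2(e_2) \oplus \cdots \oplus g_i(e_i). \tag{17}$$
Proof: The proof is by induction on $\mathrm{depth}(v)$. For $\mathrm{depth}(v) = 1$ the algorithm computes
$$\alpha^{(m)}(v) = \bigoplus_{e : \mathrm{fin}(e) = v} \lambda(e) \odot \left(1\,(g_1(e))^m \odot 1_\odot\right) = \bigoplus_{e : \mathrm{fin}(e) = v} \lambda(e) \odot (g_1(e))^m.$$
This is, as required, the sum of the labels on all edges $e$ joining $A$ to $v$, weighted by $(g_1(e))^m$. For a vertex $v$ at depth $i + 1$, the induction hypothesis gives the value assigned to $\alpha^{(m)}(v)$ as
$$\alpha^{(m)}(v) = \bigoplus_{e : \mathrm{fin}(e) = v} \lambda(e) \odot \bigoplus_{l=0}^{m} \binom{m}{l} (g_{i+1}(e))^l \odot \bigoplus_{P : A \to \mathrm{init}(e)} \lambda(P) \odot (f(P))^{m-l}.$$
Using the axioms³ of the commutative semiring $(\mathbb{S}, \oplus, \odot)$, we have
$$\alpha^{(m)}(v) = \bigoplus_{e : \mathrm{fin}(e) = v} \bigoplus_{P : A \to \mathrm{init}(e)} \lambda(Pe) \odot \bigoplus_{l=0}^{m} \binom{m}{l} (g_{i+1}(e))^l \odot (f(P))^{m-l}.$$
Applying Equation (17) and the binomial theorem, we obtain
$$\alpha^{(m)}(v) = \bigoplus_{e : \mathrm{fin}(e) = v} \bigoplus_{P : A \to \mathrm{init}(e)} \lambda(Pe) \odot \left(f(P) \oplus g_{i+1}(e)\right)^m.$$
Every path from $A$ to $v$ must be of the form $Pe$, where $P$ is a path from $A$ to a vertex $u$ with $\mathrm{depth}(u) = i$, $\mathrm{init}(e) = u$ and $\mathrm{fin}(e) = v$. Hence, $\alpha^{(m)}(v)$ is correctly calculated by the theorem. ■

Fig. 4. Assignment of Partition Contents of a Quantized Distribution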
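The following sketches Theorem 7 for a generic commutative semiring, using the hypothetical `Semiring` container below. With ordinary $(+, \times)$, the recursion reduces to the real-valued recursion of Theorem 1. The max-plus semiring uses $\oplus = \max$, $\odot = +$, $0_\oplus = -\infty$ and $1_\odot = 0$. With log-probabilities as $\lambda$-labels, $\alpha^{(0)}(B)$ on this semiring is the Viterbi path metric. On the max-plus semiring, (17) makes $f(P)$ the maximum of the $g_j$ rather than their sum, and $\binom{m}{l}\,a$ reduces to $a$ because $\oplus$ is idempotent. Both instances are checked against explicit path enumeration.

```python
from dataclasses import dataclass
from functools import reduce
from math import comb, inf, isclose, log
from typing import Callable

@dataclass(frozen=True)
class Semiring:
    """A commutative semiring (S, add, mul) with identities zero (of add) and one (of mul)."""
    add: Callable[[float, float], float]
    mul: Callable[[float, float], float]
    zero: float
    one: float

    def power(self, a, m):        # a^m = a mul ... mul a, with a^0 = one
        return reduce(self.mul, [a] * m, self.one)

    def times(self, k, a):        # k a = a add ... add a, with 0 a = zero
        return reduce(self.add, [a] * k, self.zero)

    def total(self, items):       # "add" over an iterable
        return reduce(self.add, items, self.zero)

REAL = Semiring(lambda a, b: a + b, lambda a, b: a * b, 0.0, 1.0)
MAX_PLUS = Semiring(max, lambda a, b: a + b, -inf, 0.0)    # Viterbi-type semiring

def spc_trellis(n):
    """Hypothetical helper: (n, n-1, 2) single parity check trellis. Vertices are
    (depth, parity), A = (0, 0), B = (n, 0); edges are (init, fin, c), c in {+1, -1}."""
    return [((i - 1, p), (i, p ^ (c == -1)), c)
            for i in range(1, n + 1)
            for p in ((0,) if i == 1 else (0, 1))
            for c in (+1, -1)
            if i < n or (p ^ (c == -1)) == 0]

def forward_numerators(S, edges, n, lam, g, M):
    """Eq. (16): alpha[v][m] on the semiring S, for m = 0..M."""
    alpha = {(0, 0): [S.one] + [S.zero] * M}
    for i in range(1, n + 1):
        for e in [d for d in edges if d[1][0] == i]:
            prev, gi = alpha[e[0]], g(i, e)
            acc = alpha.setdefault(e[1], [S.zero] * (M + 1))
            for m in range(M + 1):
                inner = S.total(S.times(comb(m, l), S.mul(S.power(gi, l), prev[m - l]))
                                for l in range(m + 1))
                acc[m] = S.add(acc[m], S.mul(lam(e), inner))
    return alpha

def brute_force(S, edges, n, lam, g, m):
    """Reference: S-sum over paths A -> B of lambda(P) mul f(P)^m, with f as in Eq. (17)."""
    def walk(v, lam_p, f_p):
        if v == (n, 0):
            yield S.mul(lam_p, S.power(f_p, m))
        for e in [d for d in edges if d[0] == v]:
            yield from walk(e[1], S.mul(lam_p, lam(e)), S.add(f_p, g(e[1][0], e)))
    return S.total(walk((0, 0), S.one, S.zero))

n, p, M = 4, 0.1, 3
r, w = [+1, -1, +1, +1], [0.8, -1.3, 0.4, 2.1]     # hypothetical received word and word w
prob = lambda e: (1 - p) if e[2] == r[e[1][0] - 1] else p
log_prob = lambda e: log(prob(e))
corr = lambda i, e: e[2] * w[i - 1]
edges = spc_trellis(n)
real = forward_numerators(REAL, edges, n, prob, corr, M)
viterbi = forward_numerators(MAX_PLUS, edges, n, log_prob, corr, M)
print(real[(n, 0)][0], viterbi[(n, 0)][0])   # flow eta(A, B) and best log path metric
```

The per-edge operation counts are unchanged; only the meaning of "addition" and "multiplication" changes, as stated in Remark 11.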

Remark 11: The complexity considerations in Theorems 2 and 5 carry over to the calculation on semi-rings. However, the terms "addition" and "multiplication" then refer to the operations $\oplus$ and $\odot$.



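³The proof uses the following properties:
• $\bigoplus_i (A \odot B_i) = A \odot \bigoplus_i B_i$ requires the distributive law (factor into sum).
• $\bigoplus_i \bigoplus_j A_{ij} = \bigoplus_j \bigoplus_i A_{ij}$ requires associativity and commutativity of $\oplus$ (change order of sums).
• $A \odot B = B \odot A$ requires commutativity of $\odot$ (change order of factors).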



